Small-sample precision of ROC-related estimates
Abstract
MOTIVATION: Receiver operating characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: how well do the estimates of the AUC, TPR and FPR compare with the true metrics?
RESULTS: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results.
AVAILABILITY: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html
CONTACT: [email protected]
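The comparison of estimated and true AUC described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' protocol: it assumes a two-class Gaussian model with identity covariance, a hand-written linear discriminant with pooled covariance, and resubstitution as the only estimator. The function names (`empirical_auc`, `fit_lda`, `true_auc_gaussian`, `simulate`) and all parameter values are hypothetical. Under this assumed model the true AUC of a fixed linear discriminant has a closed form, so no large test set is needed.

```python
import math
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_lda(x0, x1):
    """Weight vector of a linear discriminant using the pooled sample covariance."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    scatter = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    pooled = scatter / (len(x0) + len(x1) - 2)
    # Small ridge term for numerical stability (an assumption of this sketch).
    pooled += 1e-6 * np.eye(pooled.shape[0])
    return np.linalg.solve(pooled, m1 - m0)

def true_auc_gaussian(w, mu1):
    """Exact AUC of score w.x when class 0 ~ N(0, I) and class 1 ~ N(mu1, I)."""
    z = float(w @ mu1) / math.sqrt(2.0 * float(w @ w))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def simulate(n_per_class=10, dim=5, delta=0.5, n_rep=200, seed=0):
    """Repeat: draw a small training set, fit LDA, record estimated and true AUC."""
    rng = np.random.default_rng(seed)
    mu1 = np.full(dim, delta)
    est = np.empty(n_rep)
    true = np.empty(n_rep)
    for r in range(n_rep):
        x0 = rng.standard_normal((n_per_class, dim))
        x1 = rng.standard_normal((n_per_class, dim)) + mu1
        w = fit_lda(x0, x1)
        est[r] = empirical_auc(x1 @ w, x0 @ w)  # resubstitution estimate
        true[r] = true_auc_gaussian(w, mu1)     # true AUC of this trained discriminant
    rms = float(np.sqrt(np.mean((est - true) ** 2)))
    corr = float(np.corrcoef(est, true)[0, 1])
    return est, true, rms, corr

if __name__ == "__main__":
    est, true, rms, corr = simulate()
    print(f"mean estimated AUC: {est.mean():.3f}")
    print(f"mean true AUC:      {true.mean():.3f}")
    print(f"RMS difference:     {rms:.3f}")
    print(f"correlation:        {corr:.3f}")
```

Varying `n_per_class` in this sketch gives a rough feel for how the RMS difference and the correlation between estimated and true AUC change with sample size. The paper's own study also covers SVM classifiers, cross-validation and bootstrap estimators, which are not reproduced here.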
Journal:
- Bioinformatics
Volume 26, Issue 6
Pages: -
Publication date: 2010